[ETL-631] Update index fields with `ParticipantIdentifier` and propagate participant id fields #110

philerooski · 2024-04-10T21:59:56Z

Update index fields to use ParticipantIdentifier
If the parent parquet dataset has a ParticipantID field, propagate it to the child
Small change to the FitbitEcg schema to include the export start and end date fields

And a small update to FitbitEcg table schema that should have been included when we originally added this data type.

sonarcloud · 2024-04-10T22:00:16Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

BryanFauble

LGTM!

thomasyu888 · 2024-04-11T00:11:32Z

src/glue/jobs/json_to_parquet.py

-    "garminhrvsummary": ["ParticipantID", "StartTimeInSeconds"],
-    "garminmanuallyupdatedactivitysummary": ["ParticipantID", "SummaryId"],
-    "garminmoveiqactivitysummary": ["ParticipantID", "SummaryId"],
-    "garminpulseoxsummary": ["ParticipantID", "SummaryId"],


Was ParticipantID the wrong key?

There's a one-to-one mapping from ParticipantIdentifier to ParticipantID. They are two different ways of representing the same identifier.

ParticipantIdentifier is the only field present in every data type (including SymptomLog) and the CE docs suggest it's the more "official" identifier -- kind of like HealthCode to ExternalId in mPower.
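To make the "one-to-one mapping" claim concrete, here is a minimal sketch of how it could be checked on a set of records. The function name `is_one_to_one` and the sample identifier values are made up for illustration; they do not come from this PR or from the CE docs.

```python
def is_one_to_one(pairs):
    """Return True if every left value maps to exactly one right value and vice versa."""
    forward, backward = {}, {}
    for left, right in pairs:
        # setdefault records the first partner seen; a different partner later breaks the mapping.
        if forward.setdefault(left, right) != right:
            return False
        if backward.setdefault(right, left) != left:
            return False
    return True


# Hypothetical (ParticipantIdentifier, ParticipantID) pairs, invented for this example.
consistent = [("MDH-0001", "id-a"), ("MDH-0002", "id-b"), ("MDH-0001", "id-a")]
conflicting = [("MDH-0001", "id-a"), ("MDH-0001", "id-b")]
```

With a mapping like this, either field identifies a participant uniquely, which is why swapping `ParticipantID` for `ParticipantIdentifier` in the index fields should not change which records are considered distinct.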

thomasyu888

🔥 Thanks for the quick work here!

rxu17 · 2024-04-11T01:01:24Z

src/glue/jobs/json_to_parquet.py

-        ).distinct()
+        index_fields = INDEX_FIELD_MAP[table_data_type]
+        additional_fields = [selectable_original_field_name, "cohort"]
+        if "ParticipantID" in parent_table.columns:


Should we modify or add a test for this function to check that ParticipantID gets included in additional_fields if it exists? I was reading the JIRA ticket and Slack thread: why would both ParticipantID and ParticipantIdentifier exist in a dataset?

Ahh, yes we should. I'm always forgetting about tests 🤦

why would both ParticipantID and ParticipantIdentifier exist in a dataset?

I don't know the exact reason, but it's not unusual to have one be the "global" identifier and the other to be a study or app specific identifier.

That's not confusing at all :D
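A minimal sketch of the kind of test discussed in this thread, written in plain Python so it runs without Spark. The helper `build_select_fields` and its signature are hypothetical stand-ins for the logic shown in the diff, not the actual function in `json_to_parquet.py`; only the `"cohort"` field and the `ParticipantID` check are taken from the diff, and the column names in the tests are illustrative.

```python
def build_select_fields(parent_columns, index_fields, original_field_name):
    """Hypothetical helper mirroring the diff: choose the columns a child table keeps."""
    additional_fields = [original_field_name, "cohort"]
    # Propagate ParticipantID to the child only when the parent table has it.
    if "ParticipantID" in parent_columns:
        additional_fields.append("ParticipantID")
    # Keep first-seen order and drop duplicates across index and additional fields.
    return list(dict.fromkeys(index_fields + additional_fields))


def test_participant_id_propagated_when_present():
    fields = build_select_fields(
        parent_columns=["ParticipantIdentifier", "ParticipantID", "SummaryId", "Samples", "cohort"],
        index_fields=["ParticipantIdentifier", "SummaryId"],
        original_field_name="Samples",
    )
    assert "ParticipantID" in fields


def test_participant_id_omitted_when_absent():
    fields = build_select_fields(
        parent_columns=["ParticipantIdentifier", "SummaryId", "Samples", "cohort"],
        index_fields=["ParticipantIdentifier", "SummaryId"],
        original_field_name="Samples",
    )
    assert "ParticipantID" not in fields
```

A real test would presumably build a small Spark DataFrame as the parent table and assert on the child table's columns instead.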

rxu17 · 2024-04-11T01:03:04Z

src/glue/jobs/json_to_parquet.py

-                + INDEX_FIELD_MAP[table_data_type]
-            )
-        ).distinct()
+        index_fields = INDEX_FIELD_MAP[table_data_type]


Given that ParticipantIdentifier is now a required index field for every data type (as far as I can tell from INDEX_FIELD_MAP), maybe we could add a test to ensure that ParticipantIdentifier is in every value of the INDEX_FIELD_MAP dict. This is more about future-proofing: double-checking that we don't later modify the code in a way that accidentally breaks this. Thoughts?

I don't want to assume that ParticipantIdentifier will be included with every data type going forward. That's a CE decision, and why I explicitly included the field in the INDEX_FIELD_MAP rather than specifying it once in this function.
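For reference, the check proposed in this thread could look roughly like the sketch below. It was not adopted, for the reason given in the reply above. The `INDEX_FIELD_MAP` here is a two-entry illustrative subset whose values are assumed from the diff (with `ParticipantID` replaced by `ParticipantIdentifier`), and `data_types_missing_field` is a hypothetical helper name.

```python
# Illustrative subset only; the real INDEX_FIELD_MAP is defined in json_to_parquet.py.
INDEX_FIELD_MAP = {
    "garminhrvsummary": ["ParticipantIdentifier", "StartTimeInSeconds"],
    "garminpulseoxsummary": ["ParticipantIdentifier", "SummaryId"],
}


def data_types_missing_field(index_field_map, field="ParticipantIdentifier"):
    """Return the data types whose index fields do not include `field`, sorted by name."""
    return sorted(
        data_type
        for data_type, index_fields in index_field_map.items()
        if field not in index_fields
    )


def test_every_data_type_is_indexed_by_participant_identifier():
    assert data_types_missing_field(INDEX_FIELD_MAP) == []
```

Returning the offending data types rather than a bare boolean keeps the failure message useful. Such a test would need to be relaxed if CE ever ships a data type without `ParticipantIdentifier`.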

rxu17

LGTM! Thanks for looking into this! Just a few comments.

Update index fields with ParticipantIdentifier

29b4588

And a small update to FitbitEcg table schema that should have been included when we originally added this data type.

philerooski requested a review from a team as a code owner April 10, 2024 21:59

philerooski temporarily deployed to develop April 10, 2024 22:18 — with GitHub Actions Inactive

philerooski temporarily deployed to develop April 10, 2024 22:21 — with GitHub Actions Inactive

BryanFauble approved these changes Apr 10, 2024

thomasyu888 reviewed Apr 11, 2024

thomasyu888 approved these changes Apr 11, 2024

rxu17 reviewed Apr 11, 2024

rxu17 approved these changes Apr 11, 2024

philerooski merged commit f312f6a into main Apr 11, 2024
15 checks passed

philerooski deleted the etl-631 branch April 11, 2024 19:16

rxu17 mentioned this pull request Jun 14, 2024

[ETL-648] Modify comparsion job to only consider records from recent exports #119

Merged

2 tasks
